40 research outputs found

A random matrix analysis and improvement of semi-supervised learning for large dimensional data

Author: Couillet Romain
Mai Xiaoyi
Publication date: 09/11/2017

This article provides an original understanding of the behavior of a class of graph-oriented semi-supervised learning algorithms in the limit of large and numerous data. It is demonstrated that the intuition at the root of these methods collapses in this limit and that, as a result, most of them become inconsistent. Corrective measures and a new data-driven parametrization scheme are proposed, along with a theoretical analysis of the asymptotic performance of the resulting approach. A surprisingly close match between theoretical performance on Gaussian mixture models and performance on real datasets is also illustrated throughout the article, suggesting the importance of the proposed analysis for dealing with practical data. As a result, significant performance gains are observed on practical data classification using the proposed parametrization.

arXiv.org e-Print Archive

HAL-CentraleSupelec

Consistent Semi-Supervised Graph Regularization for High Dimensional Data

Author: Couillet Romain
Mai Xiaoyi
Publication date: 13/06/2020

Semi-supervised Laplacian regularization, a standard graph-based approach for learning from both labelled and unlabelled data, was recently demonstrated to gain insignificantly from unlabelled data in high dimensions (Mai and Couillet 2018), causing it to be outperformed by its unsupervised counterpart, spectral clustering, given sufficient unlabelled data. Following a detailed discussion of the origin of this inconsistency problem, a novel regularization approach involving a centering operation is proposed as a solution, supported by both theoretical analysis and empirical results.

arXiv.org e-Print Archive

HAL-CentraleSupelec

Hal - Université Grenoble Alpes

HAL-Rennes 1

Lead exposure assessment among pregnant women, newborns, and children: case study from Karachi, Pakistan.

Author: Cui Xiaoyi
Fatmi Zafar
Ikegami Akihiko
Kayama Fujio
Kobayashi Yayoi
Mise Nathan
Mizuno Atsuko
Sahito Ambreen
Takagi Mai
Publication venue: eCommons@AKU
Publication date: 01/04/2017

Lead (Pb) in petrol has been banned in developed countries. Despite the control of Pb in petrol since 2001, high levels were reported in the blood of pregnant women and children in Pakistan. However, the identification of sources of Pb has been elusive due to its pervasiveness. In this study, we assessed the lead intake of pregnant women and one- to three-year-old children from food, water, house dust, respirable dust, and soil. In addition, we completed the fingerprinting of the Pb isotopic ratios (LIR) of petrol and secondary sources of exposure (food, house dust, respirable dust, soil, and surma (eye cosmetics)) within the blood of pregnant women, newborns, and children. Eight families, with high (~50 μg/dL), medium (~20 μg/dL), and low (~10 μg/dL) blood Pb levels, were selected from 60 families. The main sources of exposure to lead for children were food and house dust, and those for pregnant women were soil, respirable dust, and food. LIR was determined by inductively coupled plasma quadrupole mass spectrometry (ICP-QMS) with a two-sigma uncertainty of ±0.03%. The LIR of mothers and newborns was similar. In contrast, surma, and to a larger extent petrol, exhibited a negligible contribution to both the child’s and mother’s blood Pb. Household wet-mopping could be effective in reducing Pb exposure. This intake assessment could be replicated for other developing countries to identify sources of lead and the burden of lead exposure in the population.

eCommons@AKU

Associations of food choice values and food literacy with overall diet quality: a nationwide cross-sectional study in Japanese adults

Author: Livingstone M Barbara E
Masayasu Shizuko
Matsumoto Mai
Murakami Kentaro
Sasaki Satoshi
Shinozaki Nana
Tajima Ryoko
Yuan Xiaoyi
Publication venue: 'Cambridge University Press (CUP)'
Publication date: 05/04/2023

Ulster University's Research Portal

miR-486-3p Influences the Neurotoxicity of a-Synuclein by Targeting the SIRT2 Gene and the Polymorphisms at Target Sites Contributing to Parkinson’s Disease

Author: Haili Huang
Hui Mai
Jianghao Zhao
Jingqi Yang
Keshen Li
Lili Cui
Pei Tang
Weihao Fan
Xiaohui Li
Xiaoting Chen
Xiaoyi Chen
Xiongjin Chen
Yan Wang
YuJie Cai
Yusen Chen
Publication venue: 'S. Karger AG'
Publication date: 01/12/2018

Background/Aims: Increasing evidence suggests the important role of sirtuin 2 (SIRT2) in the pathology of Parkinson’s disease (PD). However, the association between potential functional polymorphisms in the SIRT2 gene and PD still needs to be identified. Exploring the molecular mechanism underlying this potential association could also provide novel insights into the pathogenesis of this disorder. Methods: Bioinformatics analysis and screening were first performed to find potential microRNAs (miRNAs) that could target the SIRT2 gene, and molecular biology experiments were carried out to further identify the regulation between miRNA and SIRT2 and characterize the pivotal role of miRNA in PD models. Moreover, a clinical case-control study was performed with 304 PD patients and 312 healthy controls from the Chinese Han population to identify the possible association of single nucleotide polymorphisms (SNPs) within the miRNA binding sites of SIRT2 with the risk of PD. Results: Here, we demonstrate that miR-486-3p binds to the 3’ UTR of SIRT2 and influences the translation of SIRT2. MiR-486-3p mimics can decrease the level of SIRT2 and reduce α-synuclein (α-syn)-induced aggregation and toxicity, which may contribute to the progression of PD. Interestingly, we find that a SNP, rs2241703, may disrupt miR-486-3p binding sites in the 3’ UTR of SIRT2, subsequently influencing the translation of SIRT2. Through the clinical case-control study, we further verify that rs2241703 is associated with PD risk in the Chinese Han population. Conclusion: The present study confirms that the rs2241703 polymorphism in the SIRT2 gene is associated with PD in the Chinese Han population, provides the potential mechanism of the susceptibility locus in determining PD risk and reveals a potential target of miRNA for the treatment and prevention of PD.

Directory of Open Access Journals

Méthodes des matrices aléatoires pour l’apprentissage en grandes dimensions

Author: Mai Xiaoyi
Publication venue: HAL CCSD
Publication date: 16/10/2019

The Big Data challenge creates a need for machine learning algorithms to evolve towards large dimensional and more efficient learning engines. Recently, a new direction of research has emerged that consists in analyzing learning methods in the modern regime where the number n and the dimension p of data samples are commensurately large. Compared to the conventional regime where n >> p, the regime with large and comparable n, p is particularly interesting, as the learning performance in this regime remains sensitive to the tuning of hyperparameters, thus opening a path to the understanding and improvement of learning techniques for large dimensional datasets. The technical approach employed in this thesis draws on several advanced tools of high dimensional statistics, allowing us to conduct more elaborate analyses beyond the state of the art. The first part of this dissertation is devoted to the study of semi-supervised learning on high dimensional data. Motivated by our theoretical findings, we propose a superior alternative to the standard semi-supervised method of Laplacian regularization. Methods involving implicit optimizations, such as SVMs and logistic regression, are next investigated under realistic mixture models, providing exhaustive details on the learning mechanism. Several important consequences are thus revealed, some of which even contradict common belief.

HAL-CentraleSupelec

Thèses en Ligne

HAL-Rennes 1

Methods of random matrices for large dimensional statistical learning

Author: Mai Xiaoyi
Publication date: 16/10/2019

The Big Data challenge creates a need for machine learning algorithms to evolve towards large dimensional and more efficient learning engines. Recently, a new direction of research has emerged that consists in analyzing learning methods in the modern regime where the number n and the dimension p of data samples are commensurately large. Compared to the conventional regime where n >> p, the regime with large and comparable n, p is particularly interesting, as the learning performance in this regime remains sensitive to the tuning of hyperparameters, thus opening a path to the understanding and improvement of learning techniques for large dimensional datasets. The technical approach employed in this thesis draws on several advanced tools of high dimensional statistics, allowing us to conduct more elaborate analyses beyond the state of the art. The first part of this dissertation is devoted to the study of semi-supervised learning on high dimensional data. Motivated by our theoretical findings, we propose a superior alternative to the standard semi-supervised method of Laplacian regularization. Methods involving implicit optimizations, such as SVMs and logistic regression, are next investigated under realistic mixture models, providing exhaustive details on the learning mechanism. Several important consequences are thus revealed, some of which even contradict common belief.

Theses.fr

Semi-supervised Spectral Clustering

Author: Couillet Romain
Mai Xiaoyi
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 28/10/2018

In this article, we propose a semi-supervised version of spectral clustering, a widespread graph-based unsupervised learning method. Semi-supervised spectral clustering has the advantage of producing a consistent classification of data given a sufficiently large number of labelled or unlabelled data, unlike classical graph-based semi-supervised methods, which are only consistent on labelled data. Theoretical arguments are provided to support this novel approach, along with empirical evidence that confirms the theoretical claims and demonstrates its superiority over other graph-based semi-supervised methods.

HAL-CentraleSupelec

Crossref

Hal - Université Grenoble Alpes

HAL Descartes

HAL-Rennes 1